
All Questions

0 votes · 0 answers · 20 views

Python - adding more timesteps makes my model "fail"

Hi! I have just made my first model in stable-baselines3 using pygame in Python. The game is about a ball reaching the highest platform out of three placed in the sky. Now, after a few days of trying ...
asked by Skorejen
0 votes · 1 answer · 164 views

Reward not improving for a custom environment using PPO

I've been trying to train an agent on a custom environment I implemented with gym where the goal is to resolve voltage violations in a power grid by adjusting the active power (loads) at each node. I ...
asked by W8_4_it
1 vote · 1 answer · 87 views

Deep RL problem: Loss decreases but agent doesn't learn

I'm implementing a basic Vanilla Policy Gradient algorithm for the CartPole-v1 gymnasium environment, and I don't know what I'm doing wrong. No matter what I try, during the training loop the loss ...
asked by wildBass
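A common source of this confusion: in policy-gradient methods the reported loss is only a surrogate whose magnitude scales with the returns, so a decreasing loss says little about learning progress; average episode return is the signal to watch. A minimal sketch of the surrogate, assuming a PyTorch categorical policy (all names hypothetical):

```python
import torch

def vpg_loss(logits, actions, returns):
    """REINFORCE-style surrogate loss (hypothetical helper, PyTorch assumed).

    logits:  (T, n_actions) action logits from the policy network
    actions: (T,) actions actually taken
    returns: (T,) rewards-to-go (ideally normalized)
    """
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    # The gradient of this quantity is the policy gradient; its *value*
    # tracks the scale of `returns`, so it is not a progress metric.
    return -(log_probs * returns).mean()
```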
1 vote · 0 answers · 22 views

Optimizing Wind Park Layout Using Direct Action-to-Input Mapping

I’m optimizing a black-box objective function where the task is to find the optimal turbine locations in a wind park. Previously, I used a PPO reinforcement learning approach with a step-by-step ...
asked by Shahriar
1 vote · 1 answer · 156 views

How Do I Optimise a Black-Box Objective Function with DQN Using Reinforcement Learning?

I'm a beginner in the field of reinforcement learning, and I'm currently working on a problem that has me a bit stuck. I'm trying to optimize a black-box objective function using reinforcement ...
asked by Shahriar
0 votes · 1 answer · 289 views

Why is PPO not choosing a solution that is giving a higher cumulative reward?

I use PPO to train my fermenter (digital twin) to maximize enzyme (product) production. Action: 1 or 0, i.e., add substrate at a particular time or not, based on the cells and enzymes present in the tank ...
asked by user79474
1 vote · 0 answers · 177 views

Python libraries for multi-armed bandit problems [closed]

I am working on a problem that can be cast as a contextual bandit problem with a continuous action space. I would like to tackle it by using something like the contextual zooming algorithm from the ...
asked by Onil90
2 votes · 1 answer · 67 views

How does reward work while training a Reinforcement Learning agent?

I am using PPO (from Stable Baselines 3) to train an agent in an environment I created. I am confused about whether I should set the reward to 0 in the step function or not. Initially, I used to have self.reward = 0 in ...
asked by user79474
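For questions like this one, the gym/gymnasium contract is that step() returns the reward for the current transition only; the training algorithm (here PPO in Stable Baselines 3) accumulates returns itself. A minimal sketch, with all environment internals hypothetical:

```python
def step(self, action):
    self._apply_action(action)            # hypothetical dynamics update
    reward = self._step_reward(action)    # reward for THIS transition only
    terminated = self._goal_reached()     # hypothetical termination check
    truncated = False
    # No running self.reward is needed; PPO sums discounted rewards itself.
    return self._get_obs(), reward, terminated, truncated, {}
```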
1 vote · 1 answer · 158 views

Why are these two implementations of the $\epsilon$-greedy policy different?

According to the book Reinforcement Learning: An Introduction, the $\epsilon$-greedy policy can generally be implemented as: $$ \pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } ...
asked by kklaw
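For reference, the full formula the excerpt truncates, assuming it follows the standard Sutton and Barto formulation, is:

$$
\pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } a = \arg\max_{a'} Q(s, a') \\ \frac{\epsilon}{|A|} & \text{otherwise} \end{cases}
$$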
1 vote · 1 answer · 121 views

RL agent for autonomous vehicle is able to follow the road but can't avoid crashing at all (Highway-Env / Racetrack Env.)

I coded some deep RL algorithms (DQN and SAC) with tf2/keras to solve an environment where a vehicle needs to follow the track and avoid crashing into one other vehicle (there is only one other ...
asked by rafiqollective
1 vote · 1 answer · 478 views

Always getting the same action from an A2C in stable_baselines3

I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that ...
asked by Jesuspc
1 vote · 1 answer · 603 views

What is the problem in my implementation of actor critic?

I have been implementing both REINFORCE with baseline and actor-critic to solve "CartPole-v1". As a reminder, here is the presentation of the algorithms in Sutton and Barto's book (http://...
asked by Labo
1 vote · 1 answer · 443 views

OpenAI Gym. Training problem: invalid values [closed]

I have a problem with my reinforcement learning model. I am trying to simulate electric battery storage. To keep it as simple as possible, the efficiencies of charge, storage, and discharge are 100%. ...
asked by MiPre
3 votes · 0 answers · 152 views

Are there Reinforcement Learning algorithms specialized for the case $\gamma=0$?

I have a Reinforcement Learning problem where the optimal policy does not depend on the next state (i.e., $\gamma = 0$). I think this means that I only need an efficient exploration algorithm coupled ...
asked by AJSV
0 votes · 1 answer · 99 views

What would the "state space" and its Python implementation be for my simulation?

Context: I'm trying to build a social-consensus simulation involving two intelligent agents. The simulation involves a graph/network of nodes. Nearly all of these nodes (> 90%) will be green agents. ...
asked by The Pointer
